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Abstract 

We introduce and study covariance fields of distributions on a Rie- 
mannian manifold. At each point on the manifold, covariance is de- 
fined to be a symmetric and positive definite (2,0)-tensor. Its product 
with the metric tensor specifies a linear operator on the respected tan- 
gent space. Collectively, these operators form a covariance operator 
field. We show that, in most circumstances, covariance fields are con- 
tinuous. We also solve the inverse problem: recovering distribution 
from a covariance field. Surprisingly, this is not possible on Euclidean 
spaces. On non-Euclidean manifolds however, covariance fields are 
true distribution representations. 



1 Preliminaries 

Subject of this study are random variables on Riemannian manifolds. 
For the sake of clarity and self-consistency we will briefly recall the main 
notations and facts from Riemannian geometry we are going to use. For a 
comprehensive introduction the reader is suggested to refer to [I], [11] or [5]. 

1.1 Riemannian manifolds 

Let M be a n-manifold with differentiable structure given as a collection of 
charts (U a , x Q ) where U a are open sets in W 1 and x Q : U a — > M are injective. 
For p e Xa(U a ), (U a , Xq) is called a parametrization or system of coordinates 
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at p. Thus, when we say coordinates a; at a point of M, we will understand 
a local system of coordinates given by a chart (U, x). 

With M p we denote the tangent space at p G M. The tangent bundle 
TM on M is given by TM = {{p,v)\p G M,v G M p }. It is a 2n-manifold. 
The map it : TM — > M, n(p, v ) = p denotes the natural projection. 

Recall that if Mi and M 2 are two manifolds and : Mi —>■ M 2 is a differ- 
entiate mapping, differential of 4> at p G Mi (also called a push-forward) is 
a linear mapping c?0 p : {M\) p — > (M 2 )0( p ) given by d(f> p (v)(f) = v(f o 0) for 
any v G {Mi) p and / G C°°(Mi), a differentiate function on M 2 . If d(p p is 
an isomorphism, then is a local diffeomorphism at p (Theorem 2.10 in [1]). 

With respect to a parametrization (£/, x) and p = x(xi, x n ) G x(C/), 
the tangent space M p of M at p has canonical basis {^-| p }" =1 . Let v G M p , 

then ^ = (vl, ■■■,v™) G M n , such that v = Y^=i v x~£r\pi ^ s tne vec toi' of 
components of v with respect to coordinates x. A Riemannian structure g 
on M defines an inner product < ., . > p on M p such that gij(x\, ...,x n ) =< 
~j^r.\pi ~£r\p > p are different iable functions on U. At each p G x(f), the n x n 
symmetric and positive definite matrix G x = {gij(x)} is called a coordinate 
representation of the metric at p. If y is another local system of coordinates 
at p and A = {^\ p }^ ,- =1 , is the Jacobian of the change at p, which is a non- 
singular matrix, then component tangent vectors and metric representations 
change according to 

v y = Av x , (1) 

G y = {A- l )'G x A-\ (2) 

Any Riemmanian manifold can be endowed with a natural measure called 
volume measure. Let x be local coordinates at p G M, the volume measure 
with respect to x is defined by 

dV{x) := (dV(p)) x = ^det(G x )dx, 

where dx is the Lebesgue measure in M n . One easily checks that a change of 
local coordinates at p from x to y, does not change the expression for dV(p), 
dV{x) = dV{y). 

Throughout this paper we will assume that M is a Riemannian manifold 
of dimension n. 

1.2 Exponential map and its inverse 

Geodesies on M are defined as solutions of first order system of differential 
equations, called geodesic equations (3.2 in [1]). In local chart (U,x) they 
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are 



dx k 



dy k 



(3) 



dt 



= Vk, 



ot 



where Y k , are different iable functions in U. The theory of ordinary differential 
equations says that for any (xi,yi) G U x MJ 1 , there exists a neighborhood 
W of and e > such that for any (xo,yo) G W, (J3J) has a unique 

solution t i— > c(t) for |t| < e satisfying c(0) = xo and c'(0) = yo. Moreover, 
c(t) depends differentially on the initial conditions. 

For q G M and v G M g let 7(t, q,v), t G (— e, e) be a geodesic on M such 
that 7(0, = g and j'(0,q,v) = v. Thus for x(t) = x _1 oj(t,q,v) and 
?/(t) = (x _1 o j)'(t, q, v), (x(t),y(t)) is a solution of the system (j3J). 

For any p G M, there is a set U C TM, p G U and e > 0, such that 
V(g, t>) G W, 7(t,g,w) is well defined and different iable function of (t,q,v) in 
(— e, e + 1) x U. Let g = j(l,q,v) and C = j'(l,q, v). Then it follows from 
(ED that (g, -£) G U and for t G (-e, e + 1) 



It is a different iable map on U. 

For any p G M, there is a maximal neighborhood V(p) of the origin in 
M p where exp p is a diffeomorphism; U(p) = exp p (V(p)) is called maximal 



is an isomorphism. One checks that (dexp )o{v) = 7'(0,p, v) = v and 



l(t,q, -v) = 7(1 - t,q,v) 



(4) 



j'(t,q, -v) = -Y(l - 
The exponential map, exp : W — > M, is defined by 



exp q (v) = exp(g, v) = 7(1, g, v). 




(dexp p )„ : M p -> M £ 



(dexp) v (v) = i(\,p,v). 



(5) 
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Indeed, 

(dexp p ) v (v) = ^exp p ((t+l)v)\ t=0 = ^pi^P, (t+l)v)\ t=0 = ^(t+l,p,v)\ t=0 . 

By the Gauss lemma (3.5 in [4]), (dexp p ) v also satisfies 

< (dexp p ) v (v ), (dexp p ) v (w) >=< v,w >, 

for any w G V{p). 

Let p <E U(q) and g G then by applying ([5]) and (j4j we find 

-exp~ 1 p= ^'(Oj^exp^V) = 
7'(l,p,exp; 1 g) = (o?exp p ) exp -i ? (exp- 1 g). 
Therefore, for a fixed p, we have following expression of exp -1 p : £/(j>) — > TM 

exp~ 1 p= -7'(l,p, Ooexp" 1 . (6) 

Since 7 ; (l,p, .) is differentiable in V(p) and exp" 1 is a diffeomorphism in 
exp _1 p is differentiable in U(p). 
The map q i— > exp" 1 p is differentiable in £/ (p) in the following sense. If 
x are local coordinates at q G £7(p), <? = x(x l5 ...,x n ), then the components 
of (exp" 1 ^)^. G lR n are differentiable functions of x. Moreover, we show 
Lemma 1 For q = x(xi, ...,x n ) G U(p), the symmetric matrix 

Z x (q,p) = (ex-Pq 1 P)x(exp~ 1 p)' x (7) 

is differentiable in x. 

A change of local coordinates from x to y with Jacobian A at q, changes 
coordinate expression of Z(q,p) according to 

Z y (q,p) = AZ x (q,p)A'. (8) 

We will adopt, for brevity, following notation for the inverse exponential 
map 

qp := exp~ l p, 

in analogy to the Euclidean case where, exp~ l p = p — q = qp, for p, q G M n . 
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1.3 Tensors and tensor fields 

Let V be a n-dimensional vector space and V* be its dual space of linear 
functions on V. Let also a; be a basis of V and x be its dual basis. For v G V 
with v x G M n we denote the column vector of components of v, v — Yli v x x i- 
While for a co- vector w G V* with w x we denote the row vector of components 

of W, w' x G M. n , W = J2i W x%i- 

Co-variant 2-tensor T on V is a bi-linear function T : V x V — * M, 
which with respect to the basis x is represented by a matrix T x . Coordinate 
expression for T is 

T(u,v) = u' x T x v x ,Wu,v G V. 

With T 2 (V) we denote the vector space of co- variant 2-tensors on V. 

Similarly, contra- variant 2-tensor W is a bi-linear function W : V* xV* ^ 
K. with a coordinate expression W x with respect to x 

W(u,v) = u x W x v' x , Vu, v G 1/*. 

With T 2 (V) we denote the vector space of contra- variant 2-tensors on V. 

Let T G T 2 (V) and W G T2(V). The contraction TW of their tensor 
product T <g> W is & (1,1) tensor with coordinates expression T X W X with 
respect to x. We denote G T±(V). 

Let y be another basis on V such that y = Ax for a non-singular matrix 
A. Coordinate expressions for T, W and TW change according to 

T y = {A- x )'T x A-\ 

W y = AW X A\ 

and 

(TW)y = (A~ l )'(TW) x A'. 

Recall that two matrices C and D are called congruent if there exists a non- 
singular matrix P such that C = PDP' and are called similar if C = PDP^ 1 . 
Looking back at the change of coordinates rules we may conclude that the 
coordinate representations of (2,0) and (0,2) tensors are congruent, while 
those of (1,1) tensors are similar. 

Let now M be a Riemmanian manifold with metric structure G. The 
expression ([2]) for the change of the metric representation at a point of M is 
co-variant like. Therefore, at each point p G M the metric is a symmetric 
and positive definite co-variant 2-tensor G(p) on the tangent space M p ; a 
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fact that we notate with G(p) G T 2 (M P ). Globally, G is a co- variant 2-tensor 
field which is differentiable in the following sense. If X, Y G TM are two 
differentiable vector fields on M, then G(X, Y) is a differentiable function 
on M. With T 2 (M) (7% (M)) we denote the differentiable co- variant (contra- 
variant) 2-tensor fields on M. We write 

G G T 2 (M). 

Now we return back to Z(q,p) which is given by Z(q,p) = (qp)(qp)', 
wherever the qp = exp" 1 p is defined. The change rule (jSJ) for it is a contra- 
variant like and thus, Z(q,p) is a symmetric and non-negative definite contra- 
variant 2-tensor at M q . Moreover, by lemma HJ for any fixed p G M, Z(.,p) 
is a differentiable contra-variant tensor field on U(p) - a fact that we write 
as 

Z(.,p)ET 2 (U(p)). (9) 

Moreover, for every q G U(p), G(q)Z(q,p) G Tl(M q ) and G(.)Z(.,p) is a 
differentiable (l,l)-tensor field on U(p), i.e. 

G(.)Z(.,p)eT'(U(p)). (10) 

1.4 Linear operators on tangent spaces 

Linear operator on vector space V is any L : V — > V such that 

L(avi + (3v 2 ) = aL(i>i) + (3L(v 2 ), 

for any two V\,v 2 G V and a, (3 G E. With respect to a basis x, L is 
represented by a matrix L x and then L(v) = L x v x . Let y be another basis 
such that y = Ax, then 

L(v) = LyVy = L y Av x = (L(v)) y = A(L(v)) x = AL x v x . 

Therefore 

L y = AL X A , 

which correspond to the change of coordinates rule for (l,l)-tensors. 

Return back to Riemannian manifold setting. Let p G M and q G U(p), 
then G(q)Z(q,p) defines a linear operator on M q . In local coordinates x at 
q it is defined as 

Vx ^ (G x Z x )'v x = Z x G x v x , v G M q . 
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If v , w e M q 

< w, (GZ)(v) >= w' x G x Z x G x v x . 

In particular, 

< v, (GZ)(v) >= v x G x Z x G x v x = ((exp" 1 p)' x G x v x )'((expg 1 p)' x G x v x ) > 0, 

for v 7^ 0. 

We summarize in the following 
Lemma 2 For any fixed p G M, G(.)Z(.,p) is differentiable field of linear 
operators on U(p) and thus, if X is differentiable vector field on U(p), then 
(G(.)Z(.,p))(X) is also a differentiable vector field on U(p). 

1.5 Distributions on Riemannian manifolds 

Let M be a Riemannian manifold and (U a , x a ) are charts of M. Open sets 
~X-a(Ua) in M generate a er-algebra on M which we will denote with A(M). 
One easily verifies that the volume measure, V, is a measure on the cx-algebra 
A(M) and thus, (M, A(M),V) is a measure space. 

A random variable X on M is any measurable function from a probability 
space £>, V) to (M, A, V). The distribution function F of X is defined as 

F(A) = V(X^(A)), A e A(M). 

F is a countably additive and satisfies F > 0, F($) = and F(M) = 1. 
Definition 1 A distribution F on M is said to be absolute continuous with 
respect to the volume measure if F(A) = J.dF(p),VA G A(M), where dF(p) 
is given by 

dF(p) = f(p)dV(p), 
for a A(M) -measurable function f. We say that f is density (pdf) of F. 

In the above definition, a density function / is measurable in sense that 
f~ 1 {B) G A(M) for every Borel set B in M. The density / is continuous 
everywhere on M except eventually a set of volume measure zero. 

In this work we will also consider discreet distributions on M. They are 
not absolute continuous w.r.t. V and instead of density have probability mass 
function (pmf). 
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2 Covariance fields 



2.1 Definition 

Definition 2 Covariance field of probability distribution F is a contra-variant 
positive definite 2-tensor field E on M, given by 

q ^ E(q) = / (qp)(qv)'dF(p), 

JU(q) 

where U(q) is the maximal normal neighborhood of q. 
In the notation of ([7]), the covariance of F at q is 

E(g) = / Z(q,p)dF(p)eT 2 (M q ) 

JU(q) 

because Z(q,p) G T 2 (M q ). At this stage, we do not claim that E is differen- 
tiable not even continuous field on M or on an open subset of M. 

In local coordinates x, Z x (q,p) is a symmetric non- negative matrix and 
therefore S x (g) = fu( q \ Z x (q,p)dF(p) is symmetric and non-negative definite. 
In fact, ^ x (q) will be positive definite, except the cases when the support of 
F in x coordinates is a hyperplane in M n . We ignore these cases, which obvi- 
ously are caused by ill defined distributions, and assume positive definiteness 
of Tj x (q). Correspondingly, for the contra- variant tensor E(g), we assume 
symmetry and positive definiteness. The space of symmetric and positive 
definite matrices (tensors) we denote with Sym^. 

At point q G M, we consider the product GS(g) := G(g)E(g) of G(q) G 
T 2 (M q ) and E(g) G T 2 (M q ). It is a linear operator in M q , i.e. 

GE(q)eTl(M q ). 

Let G x and E x are representations of G(q) and E(g) with respect to coordi- 
nates x about q. Then 

< w, (GE(q))(v) >= w' x G x T, x G x v x , 

for v , w G M q . Moreover, (GE(g)) -1 is also a linear operator in M q and 

< U ;,(GE(g))- 1 (t;) >=w'jl x 1 v xi 
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If F(U(q)) = 1, then 



tr(G(q)E(q)) = [ (qp)' x G x (q)(q$) x dF(p) = 

JU(q) 

[ \\Log q p\\ 2 dF(p), 

J M 

Finally, for the intrinsic mean /i, which is the Frechet mean of F on the metric 
space M equipped with the geodesic distance, we obtain 

fx = argmin q tr(G'E(q)), 

in agreement to a well known fact about mean of distributions in W L . 

After this illustrating example we are motivated to give a more general 
definition of expectation. 

Definition 3 Let F be a distribution on M, q G M, TW G T^(M q ) and h 
be a linear operator on T^(M q ). Then the expectation of h(TW) is defined 
to be 

E(h(TW)) = [ h(TW(p))dF(p). 

J M 

As shown above, this is applicable for h(A) = tr(A) and TW = G(q)Z(q,p). 
Then GE(q) = E(tr(G(q)Z(q,p))), provided that F(U(q)) = 1. 

2.2 Continuity of GS 

We say that a series of points qu on M converges to a point qo and denote 
Ik Qo if d(qk, qo) —* 0, where d is the geodesic distance on M. 
Proposition 1 Let F be a distribution on M and for q^ G M, k = 0, 1, 
we have q^ — > qo, as k —>■ oo, F(B(q k )) = 1 for all k and tr(GE(q )) < oo. 
Then tr(GE(q k )) -> tr(GE(q )). 

Proof. Without lost of generality we may assume that for all q k , d(q , q k ) < r. 
Let X be a random variable with distribution F. Define random variables = 
tr(G(q k )Z(q k ,X)) and £ = tr(G(q )Z(q , X)), where Z(q,p) = {qp){qp)' '. 
Observe that 

F(U k (M\B(q k ))) < - F(B(q k ))) = 

k 
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and therefore F{y\ k B(q k )) = 1. Since G and Z are continuous at go we have 
& Co, a.e.. For every p G n B(q ) 

tr(G(q k )Z(q k ,p)) = d 2 (q k ,p) < (d(q k ,q Q ) + d(q ,p)) 2 

< {r + d(q ,p)f < max{(r + l) 2 ,r 2 + (2r + l)d 2 (q ,p)} 

and thus 

< & < max{(r + l) 2 ,r 2 + (2r + l)f }, a.e.. 

Finally, since E^ < oo, the dominated convergence theorem gives us E(C, k ) — => 
E(£o), which is exactly the claim. □ 

Proposition 2 Under the conditions of proposition^ if v k G M gfc — > v G 
M go and w k G M 9j . — > w G M go , i/ien 

< v k , (GE(q k ))(w k ) >^< v , (G£(g ))Oo) > 

Proof. Define random variables 

Vk =< {G(q k )Z(q k ,X))(v k ) > 

and 

?7o =< w , (G(go)^(go, -X"))(«o) > • 
We have r\ k — > 770, a.e. and 

< ?7fc < ||'yfc||tr(G(g fc )Z(g fc ,X))||w fc ||. 

As in Proposition [TJ % are bounded by a random variable with a finite 
expectation. Therefore, again by dominated convergence theorem, E(j] k ) — > 
E(rj ). □ 

We say that covariance field E is continuous in the sense given by Propo- 
sition [21 i.e. GE is a continuous field of linear operators on tangent spaces 
of M. 

Definition 4 The covariance field E of a probability distribution F is contin- 
uous at q G M if for any two continuous vector fields v and w on M defined 
in a neighborhood of q, the function < v, (GY>)w > is continuous at q. 

Proposition [1] states sufficient conditions for continuity of E. We emphasize 
that continuity of GE may hold even when the density of F is discontinuous, 
provided the conditions are met. Moreover, even discrete distributions may 
have continuous covariance fields. 
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Proposition 3 Let F be a distribution on M, g G M such that tr(G£(g )) < 
oo and in a neighborhood B of go on M, F(B(q)) = 1, for all q G B . Then 
the eignevalues of GS are continuous functions at go- 

Proof. Let (x, U) be a parametrization about go, such that x(0) = go and 
x(£7) C -Bo- Let Xi(x), i = l,...,n, be the eigenvalues of GS(x(i)), x <E U. 
Consider the canonical tangent vectors Vi(x) = ^\ x ( x ), which form continu- 
ous vector fields on x(f). By Proposition [2J for all i and j 

<Vi,{GZ{ q )){vj) >= (■',■>:,(■', u 

are continuous functions at 0. Since the elements of matrix G x are continuous 
at 0, so are those of G X T, X . Therefore the eigenvalues Aj(x) are continuous 
at 0. □ 

2.3 Extended covariance fields 

In the course of our research we will find useful to extend the definition 
of covariance field to a whole class COV(F) of contra- variant tensor fields 
associated with a particular distribution F. Its members are all 

S(g;r)=/ {qp){qp)'r(\\qp\\)dF(p), (11) 

J M 

where r : M + — > R + is a continuous function. Thus, 

COV(F) = {S(g;r)|r G C(R + )}. 

The role of r-function in ( fTTT) is to control the 'amplitude', tr(GS), of the 
covariance field. It has analytical purpose that is important in numerical 
experiments. Fields with large amplitude are difficult to be analysed numer- 
ically and this fact matters when one develops computational algorithms. 
We will try to illustrate this with an example. 

Example 1 Let M = § 2 with the standard differential and metric structure 
(see details in Appendix S). For every p G M, T P M = M 2 and a normal 
neighborhood of p is S(p,n), the image of the circle C(0,n) C T P M under 
the exponential map, i.e. S(p,n) = Exp p (C(0, ir)). 

Let F be uniform distribution on M and y\,...,yk are samples drawn from 
F. What is the sample covariance field of F based on these samples? 



11 



First, we consider the generic covariance field 

k 



1=1 

For a second one we apply r(t) = (1 — ^) 2 , 

k 

s 2 (%) = ^(^)(^)'(1 



Lei (a,t) be polar coordinates on § 2 at p. Then dV(a,t) = sin{t)dctdt and 
the density is a constant, f(a,t) = l/(4ir). As usual with G we denote the 
metric tensor. First, we calculate 

E{HG{y j )&&) = \j t 2 sm{t)dt = y - 2, 



and 



while, 



,n 2 



E(tr(G(y j )X 1 (y j ))) = (k-l)(--2), 



2 



E{tr{G{ yj ){^{^'){\ 



and 



2 \\VjVi\\ 

2 

t 2 stn(t)(l-^) 2 dt=^-2 

E(tr(G(y j )E 2 ( Vj ))) = (k-l)(?j-2). 

Clearly, E(tr(G(y j )H 1 (yj))) > 6(fc - l)E(tr(G( yj )X 2 ( yj ))). 
□ 

Numerical experiments confirm the benefit of applying an amplitude 
bounding r-function. A typical choice for r, suggested by experiments, is 

r(t) = (1 "2M )2 (12) 

when the distribution has a bounded domain with geodesic radius R. It is a 
member of the family {r(t) = (1 — j^j ) 2 , & > 0}. 
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Recall that a geodesic radius of distribution F is the minimal R such that 
for every p G supp(F), supp(F) C Exp p (C(R)), where C(R) is the ball with 
radius R in tangent space at p, i.e. C(R) = {v G R n ||w| < R}. 

Next result provides sharper estimator of parameter a for the above family 
of r-functions based on the criterion tr(GX) to be minimal. But first we need 
following definition. 

For every point p G M, there are geodesic spherical coordinates (8, t) 
defined by 

p(8,t) = ExppitO), (9,t) G U(p) C S^ 1 x [0,oo), 

and Exp p (U (p)) = B(p), the maximal normal neighborhood of p. The change 
of coordinates at p from normal v to spherical (9, t) is dv = t n ~ 1 dddt. 

A distribution F is said to have a bounded density f, dF{p) = f{p)dV(p), 
if there exists a positive constant C such that for any p G M, if (6, t) are the 
spherical coordinates at p, then 

f(e,t)^\Gi^tj\<c,W9,teU(p). 

For example, on the unit 2-sphere, § 2 , 

y/\G(6,t)\d0dt = sin(t)d6dt, (9,t) e [0,2tt) x [0,tt). 

and any function / on S 2 such that C > / > 0, after rescaling, defines a 
distribution with bounded density. 

Lemma 3 Let T be a family of distributions on M with geodesic radius R 
and bounded by C density, then for 

r(t;a) = (l-^)\ (13) 

we have 

R 2 

sup sup tr{G{q)Z{q; a)) < V(§ n " 1 )C , J R(— - aR + a 2 ), 

FeTqeM 3 

where the covariance field a) is defined by (Tl\) with r(t,a) and "^(S™" 1 ) 
is the volume o/S™" 1 . 
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Proof. For q G M and spherical coordinate system (9,t) 6 U C§" 1 x [0, R] 
at p we have 

tr{G{q)E(q;a)) = / 1 1^| | 2 (1 - jr^) 2 dF(p) = 

Jm \\m\ 

[ t 2 (l-j) 2 f((9,t))^/\G((9,t))\d9dt. 
Ju 1 

From the bounded density assumption 

tr(G(g)£(g; a)) < C / t 2 (l - -) 2 d9dt < 
Ju t 

V(§ n ^)C [ R t 2 (l - -fdt = V(S ri - 1 )C J R(^ - a,R + a 2 ). 
Jot 3 

Moreover the minimum of the right hand side function of a is achieved for 
R/2. □ 

As a consequence of lemma [31 on S 2 , since all distributions have bounded 
geodesic radius R = ir, a = ir/2 is the optimal choice for the parameter in 
(ITS]) and thus, r(t) = (1 — 2pj) 2 i s ^ ne optimal member of the family ( [131) . 

3 Recovering distribution from covariance 

In this section we consider the problem of recovering a distribution from 
its covariance field. If such a recovery is not possible, then the covariance 
structure will not be a complete distribution representation and its applica- 
tion potential will be diminished. Fortunately, in most circumstances, the 
answer of the problem is positive. 

3.1 Similarity invariants 

The space Sym^ of symmetric and positive definite n x n matrices is a 
well studied manifold that accepts a Riemannian structure that makes it a 
symmetric space. For example, see A. Ohara, N. Suda, S. Amari (1996) and 
W. Forstner, B. Moonen (1999). We define an important class of functions 
on Sym+ that respects similarity operation and thus, are functions that can 
be applied on linear operators. 

Definition 5 A similarity invariant function on Sym^ is any continuous h 
that satisfies 
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(%) h(AXA', AY A') = h(X, Y), VX, Y E Sym+ and A £ GL n . 

It is a non-negative with a unique root if 

(ii) h(X, Y) > 0, VX, Y £ Sym+ and h(X, Y) = X = Y. 

Moreover, h is called similarity invariant distance, if in addition to (i) and 
(ii) also satisfies 

(in) h(X, Y) + h(Y, Z) > h(X, Z), VX, Y, Z G Sym+. 
We will denote with SXM. n the class of functions satisfying (i) and (ii). 
Below we list several examples of similarity invariant functions. 

1. For a fixed Z £ Sym^, the similarity invariant 

h trdif (X, Y; Z) = \{tr(Z- x X - Z^Y)], 

satisfies (iii) but not (ii). Default choice will be Z = G^ 1 , the inverse 
of the metric tensor representation. 

2. The second one is sometimes referred as affine-invariant distance in 
Sym^, see for example [3], [7] and [H], and it is defined by 

h trln2 (X,Y) = {tr{ln\XY- l ))yl\X,Y e Symj. 

Actually, h tr i n 2 is not a unique choice for a distance in Sym^. 

3. Log-likelihood function gives us another choice for h, 

h hk (X,Y) =tr(XY~ 1 )-ln\XY~ 1 \-n. 
It satisfies (i) and (ii) but it fails to satisfy the triangular inequality. 

4. 

h lntr (X,Y) = ln(tr(XY- 1 - FX" 1 ) 2 ), 
hi n tr is another candidate for a distance in Sym^. 

5. Another interesting choice for h is 

h lnpr (X,Y) = ln{tr{XY^)tr{YX~ 1 )), 

that satisfies (iii) and 'almost' satisfies (ii): hi npr (X, Y) = <^=^ X = 
cY, for c > 0. 

15 



Let F be a distribution on M and £ be its covariance field. Consider a 
similarity invariant h applied on covariance operator field GS. If the con- 
ditions of Proposition [3] hold, h{GH) would be a continuous scalar field on 
M. Indeed, h is a continuous function of the eigenvalues of GS, which are 
continuous by themselves. 

Scalar fields of the form h(GH) can be viewed as representations of F. For 
some choices of h, they would be true distribution representations, in sense 
that the underlying distribution can be fully recovered from them. And this 
is what we are going to address next. 

3.2 Recovering discrete distributions 

Let V = {pi}i=i be a set of k points on M. By a discrete mass function 
(pmf) on M we understand any / defined on the domain set V, such that 
f = {fi = f(pi) > and Eti/i = 1- We write / G P+, where P+ 

denotes the compact k-simplex. 

Let Q = {qj} k =1 be another set of k points on M, called observation set, 
where the covariances of / G will be considered. 

We assume that the set V is contained within the maximal normal neigh- 
borhood of each of the points qj in order for the vectors qjPi to be well defined. 
This assumption is not a strong one when M is a complete Riemannian man- 
ifold. Thus, for every j we may assume a fixed local parametrization (xj, Uj) 
such that V C Xj(Uj). 

Fix an amplitude controlling function r. Covariance of / G at qj is 
defined as 

k 

: = EL/ife.) = J2(m)(QM)'r(\\m\\)fi- 

i=l 

Let us denote 

Yji = (q~&)(q~&)' r (\\qjPi\\), i = 

then = Yli=i fiXji- The collection {S[/]j}* =1 is called a covariance set 
of / on Q. 

Now we are interested in the inverse problem. How to reconstruct a pmf 
from its observed covariances? Let C = {Cj G T 2 (M qj )} k j=1 be a set of contra- 
variant tensors, which may happen to be covariance tensors of an unknown 
distribution or may not. The problem is to find / such that 

E\f] j *C jl j = l...,k. 
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To measure the 'closeness' we will use similarity invariant. 
Define the functional 

1 k 

H U) = l Y. h w^ c ^f eP ^ ( 14 ) 

where h G STM.(n). Now we can formulate more precisely our problem as 
an optimization one: find a pmf f such that 

/ = argminfH(f). (15) 

From the assumptions for h, it is clear that 

H(f) = ME[/] i ,<7 i ) = ) Vj £[/]; = Q.Yj. 

If Cj are covariances that come from a pmf f° G P^ 1 ", then / will be a 
solution of the system 

k 

^2f i Y ji = C j ,j = l...k. (16) 

i=l 

To be able to recover correctly / , the system (fTBl) should have a unique 
solution. A necessary and sufficient condition for that is 

rank(y\C) = rank{y) = k, 

where y = y[q 3 ] := {d 3i = tr(G(^)>ii)}J=i,i=i» C = {tr{G(q 3 )C,)}) =1 and 
^V|C is the matrix y with vector C attached as a last column. Note that d 3i = 
d 2 (qj,pi)r(d(qj,pi)) and for r = 1 these are the squared geodesic distances. 
Definition 6 We say that a covariance operator field GH on M has a full 
rank, if for any bounded subset A C M, k G N and k-sample {qj}j =1> selected 
by a continuous distribution Q on A, we have Pq{rank{y) = k) = 1, where 

y = {d 2 (?j > P<)r(d(g,-,Pi))}t fc i ii= i. 

In Eulcidean space, M = IR n , the rank of {d 2 (qj,Pi)}*= li=1 is bounded 
above by n + 2, i.e. for a default covariance field in Euclidean space, 

rank(y) < n + 2, 
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and (1161) does not have a unique solution when k > n + 2. The problem can 
be fixed using a non-default covariance field with amplitude function r ^ 1 
in dHJ. 

We hypothesize that on non-Euclidean space, a manifold with non-zero 
curvature, any covariance operator field is of full rank. Experiments on 
spheres, manifolds with constant sectional curvature +1, and hyperbolic 
plane, a manifold with constant sectional curvature -1, confirm the hypoth- 
esis, but of course a more formal argument is needed here. 

If matrix y has full rank, then one can find the pmf f directly solving 
system (TT61) or minimizing the functional H(f) as given in (fl4|) for h E 
SXM. n . The second choice is much more general and gives us solutions even 
in cases when Cj are not in fact true covariances, i.e. rank(y\C) > rank(y). 

It is also important to know in what cases the optimization problem (|15p 
can be solved easily. A function h for which the corresponding functional H 
is convex is an obvious choice since in that case it is straightforward to find 
the global minimum of H by the gradient descend algorithm. 

Definition 7 We say that h E SIM. n is convex (in Sym^) if for any k and 

Yji,Cj E Sym+, i,j=l,...,k, such that rank(y) = k, Y%=i KJ2i=i fi Y w C i) 
is a convex function of f in . 

From the list of invariants we list above, h\ rdi p and h^ rsq are convex. 
We will show it for the last one. 

Example 2 Let Yji be such that rank(y = {tr(Yji)}j : i) = k. We will show 
that for hf rsq (A, B) := tr^AB' 1 - BA' 1 ) 2 ), the functional 

k k k 

H{f) = £ir((E^) - E^-r 1 ) 2 

j=l i=l i=l 

is convex. Without loss of generality we assume Ci = I n . Then 

■' S j = l 8=1 1 = 1 1=1 

and defining Bj = Yli=i fi"Yij> we obtain 

^ = 2j2tr[Y js (B j -Br% 

JS 3=1 
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Second derivatives are 
d 2 H 



dfsdfi 



2 HYsM + Y sj Br%Bj* + ):,/!/),, IS/ + Y sj Bj%B/}. 



For w G M. k and w^O, let Zj = Yl s =i z s~Ysj, th 



en 



w 



'|#«- = 2 y £tr{Z j Z j + ZiBj%BT* + Z 3 Bj 2 Z 3 Bj 2 + Z 3 BfZ 3 Bj 1 }. 

J 3=1 



Since rank(y) = k, Y^=i ^ r (^j^j) > 0- Also, 

tr{Z 3 B^Z 3 Bf) = tr^B^ZjBj 1 ] [B^ZjBj 2 ]') > 

and similarly tr(Z 3 Bj 2 Z 3 BJ 2 ) > 0. T7iat proves the convexity of h 2 rsq . □ 

Now we are interested in the problem of consistency of estimators ffl5l) . If 
we assume that the covariances C 3 are random and converge in probability to 
some matrices, is it true that the estimators / also converge? To guarantee 
a positive answer we need more assumptions for the invariant h. 

Theorem 1 Let h G SIM. n be a distance function. Let also f° G P^ 
be a pmf on V and {C° = S[/°](^-)}^ =1 be its covariance set on Q. If 
C m = {C™ G Sym+} k ]=l is a sequence of set of random matrices such that 

Vj = 1, k, h{C™, C°) — > p 0, asm^oo 

then for 

1 k 

/- = argmin feP +H m (f), where H m (f) = t ^ H^f],, Cf ), 

3=1 



we have f m — > p f c 



Proof. 

Since h satisfies the triangular inequality, then for any / G P£ 



\Kn\f\j, cif) - h(x[f}3, < HCf t c°) 
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and 

sup feP +\h{nfhCf) - h(nf]j:C°\ < h{Cf,C«). 
By summing on j we obtain 

k 

su PfeP +\H m (f)-Ho(f)\ < X>(Cf ,C?). 

Under the assumptions on h, Ho(f) has a well-separated minimum at 
f° G P^ ■ In fact H (f°) = 0. Therefore, for any 5 > 0, there exists e > 0, 
such that 

H (f) > H (f°) + 2e, for any f such that \f - /°| £a > 5. 

Then we have 

^(|/ m - /°U 2 > 5) < P(F (r) > H (f°) + 2e) < 

P(F (/ m ) - H m (f m ) + ^ m (/ m ) - H (f) > 2e) < 
since tf m (/ m ) < 

P(i7 (/ m ) " + flm(/°) - #o(/°) > 26) < 

P(2su PfeP +\H m (f) - #o(/)| > 2e) < 

fc k 

P(52h(C?,C°) > ke) <^P(h(Cp,q) > e). 

3=1 3=1 

Since for any j, P{h{Cf, Cj) > e) — ► 0, we have P(\f m - /°| £a > 5) — ► 0. 
□ 

Unfortunatelly, the above result is not very useful in practice for distance 
functions are usually non-convex and non-convexity of H makes its optimiza- 
tion difficult. In fact, the condition on h to satisfy the triangular inequality 
is a stronger assumption than what we actually need. We observe that it 
is only used to bound uniformly \h(E[f]j, Cf) - /i(E[/] y ,C°| by h{Cf,C]). 
Therefore, if we guarantee the uniform convergence of the former difference 
by other means, the triangular inequality condition will be redundant. 
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Consistency Criterion 1 We say that a similarity invariant function h 
satisfies the consistency criterion, if for C m G Sym^ random, and E>i,C° G 

Sym+, 

h(C m , C°) — > p 

implies 

k k 

sup\h(Y^m,C m )-h(J2^B t ,C°)\ ^ p 0, asm^oc. 

f i=l i=l 



The following theorem is a stronger version of Theorem [TJ but using the 
above consistency criterion, and can be proven similarly. 

Theorem 2 Let h G SIM. n satisfy consistency criterion [H Let also f° G 
P£ be a pmf on V and {C° = S[/°](gj)}j =1 be its covariance set on Q. If 
C m = {CJ 1 G Sym^}j =l is a sequence of set of random matrices such that 

Vj = 1, k, h(CJ\ C°) — > p 0, as m -> oo 

then f m ^ p f°. 

It turns out that ht r in2 invariant is a distance but is not convex in the 
sense of definition ((Tj) and finding the global minimum of H tr i n2 is difficult. 
On the other hand invariants huk and h^ rsq are both convex and satisfy the 
consistency criterion [TJ, which makes them better choices. 

The condition h(C™, C°) — > p can be further simplified if h is continu- 
ous. We say that a sequence X m of random nxn matrices converges in proba- 
bility to matrix C and write X m — > p C,if for any v G M™, v'(X m — C)v ^ p 0. 
One can easily check that if h(X, C) is continuous in X G Sym^ for every 
C G Sym+, then X m — > p C if and only if h(X m , C) ^> p 0. Next we have 
the following corollary of Theorem [2j 

Corollary 1 Let h G STM, n be a continuous invariant that satisfies consis- 
tency criterion^ If CJ 1 — > p C° ; Vj = 1, k, then f m — > p f°. 

Note that both huk and h? tr satisfy the conditions of the above corollary. 
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3.3 Recovering continuous distributions 

We will give a constructive procedure for recovering a continuous dis- 
tribution density from its covariance operator field. It turns out that any 
covariance field of full rank specifies completely the underlying density when 
its domain is a bounded compact. This fact shows that covariance fields are 
in general faithful representations of corresponding distributions. Although 
this fact may seem obvious at first look, there is a notable exception that 
makes the problem of recovering relevant. In Euclidean space, M = M. n , we 
have the following relation for the default covariance field of random variable 
X with E{X) = fi 

E[(q-x)(q-x)'} = E[{p-x)(p-x)'] + (q-p)(q-p)'+(q-p){p-fJ,y+(p-fJ,)(q-p)' ', 
Thus 

= + (q-fi)(q- fi)' 

does not contain any information beyond the first two moments: the mean 
fj, and the covariance Therefore the covariance field is defined only 

by the first two moments of X and can not possibly represent the whole 
distribution. 

We use the same approach as for recovering discrete distributions and 
start with selecting appropriate similarity invariants as technical instruments. 
We need to make stronger assumptions than that in consistency criterion (TjQ). 

Consistency Criterion 2 We say that similarity invariant function h sat- 
isfies the consistency criterion if for B il C\ 1 C2 G Sym^ such that \\Bi\\ < 7 
and \ \Ci\ I < 7 

k k 

sup \h(£ m, d) - h(J2 fiBi, C 2 )| < a 7 | \Ci - C 2 \\, 
f 1=1 1=1 

for a constant a > 0, independent of Bi and Ci. 

Example 3 We will show that h\ rdi f(A, B) = tr 2 (A — B) is convex and satis- 
fies the consistency criterion^ LetYji are such thatrank(y = {tr(Yji)}j t i) = 
k. Define L(f) = E - =1 (Eti " tr ( C j)f> then 

dL k k 
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and 



For w e 



2 J £tr(Y js )tr(Y jl ) 



dfsdfr 



w 'l^ w = 2^(5> s *r(F is )) 2 > 



0=1 s=l 

because rank(y) = k. This shows the convexity. 

Next, observe that for B iy C\ and C 2 , such that \ \Bi\ \ < 7, |]Cj|| < 7 



fMBi) - triC,)) 2 - (£ fMBi) ~ tr(C 2 )f 



< 



i=l i=l 



M^-MCz)! \2tr{^f i B i ) + tr(C 1 )+tr(C 2 )\<2n(n + l)<y\\Ci-C 2 \ 



i=l 



since for any n x n matrix X, \tr{X)\ < n\\X\\. □ 

In order to show our main result we need to put some restrictions on the 
size of distribution domains. In the sequel, d g denotes the geodesic distance 
corresponding to a metric g on M. 

Definition 8 We say that the inverse exponential map exp~ x on M is Lip- 
schitz in A C M, if there exists a constant (3 > such that Vg £ A 

\\qpi - <m\\L2 = Wexp^px - exp~ 1 p 2 \\ L 2 < (3d g (p 1 ,p 2 ),Vpi,p 2 e A. 

Necessary condition for exp~ x to be Lipschitz in A is for any q G A, A C U(q), 
where, recall, U(q) denotes the domain where exp~ l is defined and is in fact 
diffeo morphia It is an open question whether this condition is sufficient one. 
The next example provides some evidence in its support. 

Example 4 On the unit 2-sphere, M = S> 2 ; the exp~ x -map is Lipschitz in 
any compact K with geodesic diameter less than n. If p = diam(K) < n 
then parameter 3 = . p , > . 

r 1 sin(p) 
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Theorem 3 Let F be a distribution on M and let K be a compact in M, 
which is bounded, diam(K) < R, contains the support of F, F(K) = 1, and 
Log-map is Lipschitz in K. Let also the default covariance field E of F has 
a full rank a.e. on M. Then there is a sequence F m of pmfs on M obtained 
from the field GT> alone, such that VV C M 

F m (V) — ► F(V), asm^oo. 

Proof. Under the assumptions for K, for any m > there exists a partition 
{Uj n }^S^ of K with geodesic diameter no greater than 1/m and each point 

g jjrn j g selected independently by the uniform distribution on U™ . Then 
P{rank{y[qf]) = N(m)) = 1. 

Define ff = F(Uf) and 



N{m) 

z m (q)= E ^(<)(<)'?^- 



Note that E m (q r ) is random because it depends on the choice of qj 1 . Since 

N(m) 

E (<)(<)'- / (QP)(qp)'dF(p) = 



N(m) 

E / [(^f)(<)' - (QP)(qp)']dF(p), 



for any t> G M g we have 

|<i;,(GE m (g)-G£(g))(i;)>| = 

E / I < u ' 95? > 2 - < v ' W > 2 \ dF (p) < 
i=i Ju T 



qq™-qp\\ \\v\\ {\\qqf\\ + \\qp\\)dF{p) < 



v\\ 2 p(2R 2 + — ) — , w. p. 1. 

m m 
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Thus the operator norm of GE m (g) — GH(q) is uniformly bounded 

||GE m (g) </3(2R 2 +-)-, w. p. 1. (17) 

m ra 

In particular, since dim(M)=n 

max|tr(G£ m (g) - GE(q))\ < n(3(2R 2 + — ) — , w. p. I. 

qeA ra ra 

Since £ has a full rank a.e. on M, then there exists a/if SXAi n that is 
convex (in sense of definition [7j) and satisfies consistency criteria [2] (w. p. 1). 
For example, we may take h = h 2 rdi j. Define 

N(m) 
N(m) 

&mU) = Wm) £ MGX[/](C),GS™(gf)) 

and let 

f m = ar grain } H m (f) and /' m = ar grain f H m (f). 

Since H m (f m ) = 0, H m (f) has a well separated minimum at / m , i.e. for 
any 6 > 0, there exists e > 0, such that 

H m {f) > H m (f m ) + 2e = 2e, for any / such that \ f - f m \ L2 > 5. 

Therefore 

P(\f m - f m \L 2 >5)< P(H m (f m ) > H m (f m ) + 2e) = 

P(H m (f m ) - #m(/ m ) + ^m(/ m ) - ^m(/ m ) > 2e) < 

since H m (f m ) < H m (f m ) 

P(H m (f m ) - H m (f m ) + H m (f m ) - Hm{f m ) > 2e) < 
P(2 sup \H m (f) - H m (f)\ >2e) < 
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by consistency criteria [2] for h and the fact that ||(<ff>) || < nR 2 on K 

N(m) 

P{anR 2 —— \\GTT{qf) - GE(gf )|| > e) — ► 0, asm^oo 

by the virtue of (O). Thus / m - f m — > p 0. 
Finally, for any V G M we have 

]T jf— p F(V), asm^oo 

and therefore 

p m (v)= j2 fr—>pF(y)- 

y.qfdV 

□ 

The above theorem is constructive and give us the freedom to choose 
a similarity invariant h provided it is convex and satisfies the consistency 
criterion [2J As we showed, h 2 rdi j satisfies both requirements and can be 
applied. In this case the corresponding optimization functional 

N(m) 

HmU) = ivR E (£ - P(^ m )) 2 

is based on the function = Epd 2 {q,p) = f K d 2 (q,p)dF(p), the "variance" 
of F with respect to q. What the theorem states, basically, is that distribution 
F can be recovered from the scalar field p on M provided that the full-rank 
condition for the covariance field on M holds and in the domain of F, exp~ l - 
map is Lipschitz. 

The requirement for h to satisfy the consistency criterion [2] can be relaxed. 
We only need h G SXM. n to be continuous invariant that satisfies consistency 
criterion [TJ Indeed, looking back at the proof above we see that | \CE m (qJ l ) — 
GE(g™-)|| — > p 0, and therefore supj. gP + \H m (f) — H m (f)\ — > p 0, which is 

what we needed to show f m — f m — > p 0. That observation gives us more 
choices for h such as the convex invariants huk and h 2 r . 

In R n recovering from default covariance field is not possible because 
the full-rank condition fails. On non-Euclidean (with non-zero curvature) 
spaces however, the full-rank condition is generally true and reconstruction 
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is possible. For example, if M = S 2 and supp(F) C K, for a compact on 
§ 2 with diam(K) < it, then the theorem is applicable and F can surely be 
recovered. 

Some of the conditions of the above theorem can be relaxed. For example, 
as defined the full-rank condition for the covariance field allow infinitely many 
choices for the points q™ and thus infinitely many choices of sequence F m 
converging to F . Different invariants h also give different approximating 
distributions F m . 

Theorem [1] shows how to recover discrete distribution provided the full- 
rank condition only. Lipschitz condition in theorem [3] is a technical one and it 
might be possible to relax it in a different approach to the problem. Another 
opportunity, for example, is to work with a covariance field with amplitude 
r / 1 and to satisfy both full-rank condition and Lipschitz condition, even 
in R n . 

3.4 Summary 

We introduced the concept of covariance field of a distribution on a Rie- 
mannian manifold. It is a contra-variant 2-tensor field on the manifold. 
Closely associated with it is a covariance operator field, which defines a lin- 
ear operator on the tangent space at each point on the manifold. Covariance 
operator fields, in most cases, are continuous. 

Covariance fields, in general, recover the underlying distributions and 
in this sense are faithful distribution representations. There is one major 
exception though where such reconstruction is not possible, the Euclidean 
space MJ 1 . This interesting fact shows that the covariance field concept is 
indeed more relevant to non-Euclidean manifolds. 
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